PSO Based Optimized Reliability for Robust Multimodal Speaker Identification

Authors

  • Md. Tariquzzaman
  • Jin Young Kim
  • Seung You Na
Abstract

Speaker recognition that operates reliably in real environments is a key challenge for ubiquitous services in human-computer interfaces. In this paper, we present a robust multimodal speaker identification system that optimizes the reliability of its different modalities. We propose an extension of the modified convection function's optimizing factors to account for optimum reliability simultaneously in audio, face and lip information. The proposed reliability measure is applied to a multimodal speaker identification framework for robust speaker identification. The particle swarm optimization (PSO) algorithm is employed to optimize the modified convection function's optimizing factors. For the face-based expert, image quality is degraded with JPEG compression in the enrollment and test sessions. Similarly, the lip-based expert's image quality is degraded to create a mismatch between enrollment and test images. Finally, artificial illumination from the opposite direction is added to the test face and lip images at different intensities. The VidTimit audio DB, which was collected in an office environment, has a high level of signal distortion. We apply local principal component analysis (Local PCA) to both the face and lip modalities to reduce the dimension of the feature vectors. All speaker identification experiments are performed on the VidTimit DB. Experimental results show that the proposed optimum reliability measures improve the identification rate (IR) by 8.67% compared with the best classifier system, i.e., the audio classifier, and, most notably, retain the consistency of the multimodal integration framework.

Key-Words: Speaker identification, Human Computer Interface, Particle Swarm Optimization, local PCA, Optimum Reliability Measures
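The abstract does not define the modified convection function, so the following is only a minimal sketch of the general idea of using PSO to tune per-modality reliability factors. It assumes a plain weighted-sum fusion of three experts' scores (audio, face, lip) and synthetic scores, neither of which comes from the paper; every name and parameter here (`pso_minimize`, `make_toy_trials`, `error_rate`, the noise levels) is hypothetical.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=60, seed=0,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5):
    """Basic global-best PSO over the box [0, 1]^dim (illustrative settings)."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal best positions
    best_val = [objective(p) for p in pos]       # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]   # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (g_pos[d] - pos[i][d]))
                # Clamp to the box so weights stay in [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def make_toy_trials(n_trials=200, n_speakers=5, noise=(0.8, 1.5, 2.5), seed=1):
    """Synthetic data (assumption): each expert scores the true speaker 1.0
    higher than the others, plus Gaussian noise of expert-specific spread."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        true_id = rng.randrange(n_speakers)
        scores = [[(1.0 if s == true_id else 0.0) + rng.gauss(0.0, sd)
                   for s in range(n_speakers)] for sd in noise]
        trials.append((true_id, scores))
    return trials


def error_rate(weights, trials):
    """Identification error of weighted-sum fusion (not the paper's function)."""
    errors = 0
    for true_id, scores in trials:
        n_speakers = len(scores[0])
        fused = [sum(w * expert[s] for w, expert in zip(weights, scores))
                 for s in range(n_speakers)]
        if max(range(n_speakers), key=fused.__getitem__) != true_id:
            errors += 1
    return errors / len(trials)


trials = make_toy_trials()
weights, fused_err = pso_minimize(lambda w: error_rate(w, trials), dim=3)
audio_only_err = error_rate([1.0, 0.0, 0.0], trials)
print("weights:", weights, "fused error:", fused_err,
      "audio-only error:", audio_only_err)
```

In this toy setting the error rate is piecewise constant in the weights, which is one reason a derivative-free optimizer such as PSO is a natural fit; a real system would evaluate the objective on held-out enrollment and test data rather than on synthetic scores.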

Similar resources

Multimodal Speaker Identification using Adaptive Decision Fusion with Reliability Weighted Summation

We present a multimodal open-set speaker identification system that integrates information coming from audio, face and lip motion modalities. For fusion of multiple modalities, the so called product rule with a novel adaptive reliability based weighting structure is employed. The proposed adaptive product rule is more robust in the presence of unreliable modalities, provided that the employed r...

Adaptive classifier cascade for multimodal speaker identification

We present a multimodal open-set speaker identification system that integrates information coming from audio, face and lip motion modalities. For fusion of multiple modalities, we propose a new adaptive cascade rule that favors reliable modality combinations through a cascade of classifiers. The order of the classifiers in the cascade is adaptively determined based on the reliability of each mo...

Missing Reliability Correction in Modality Information Integration for Robust Speaker Identification

In the emerging biometrics technology, speaker identification in real environment is one of the key issues for enhancing the density of human computer interaction. In this paper, we propose an optimizing factor through a fuzzy membership function for correcting the reliability in different modalities reliability measure in a bimodal fusion process for speaker identification. In the bimodal spea...

Selective Regenerated Particle Swarm Optimization for Multimodal Function

This article proposes an improved particle swarm optimization (PSO) with suggested parameter setting “Selective Particle Regeneration”. To evaluate its reliability and efficiency, this approach is applied to multimodal function optimizing tasks. 12 benchmark functions were tested, and results are compared with those of PSO and GA-PSO. It shows the proposed method is both robust and suitable for...

Chapter 16: Joint Audio-Video Processing for Robust Biometric Speaker Identification in Car

In this chapter, we present our recent results on the multilevel Bayesian decision fusion scheme for multimodal audio-visual speaker identification problem. The objective is to improve the recognition performance over conventional decision fusion schemes. The proposed system decomposes the information existing in a video stream into three components: speech, lip trace and face texture. Lip trac...

Publication date: 2010